Following a compiler change made in 2012, make use of the fact that for
non-zero input BSF and TZCNT produce the same numeric result (only the
EFLAGS settings differ), and that CPUs without TZCNT support will treat
the instruction as BSF (i.e. ignore what looks like a REP prefix to
them). The assumption here is that TZCNT never performs worse than BSF.

Also extend the asm() input in find_first_set_bit() to allow memory
operands.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
" je 2f\n\t"
" xor -"STR(BITS_PER_LONG/8)"(%2),%3\n\t"
" jz 1b\n\t"
- " bsf %3,%0\n\t"
+ " rep; bsf %3,%0\n\t"
" lea -"STR(BITS_PER_LONG/8)"(%2),%2\n\t"
"2: sub %%ebx,%%edi\n\t"
" shl $3,%%edi\n\t"
return VPIC_PRIO_NONE;
/* prio = ffs(mask ROR vpic->priority_add); */
- asm ( "ror %%cl,%b1 ; bsf %1,%0"
+ asm ( "ror %%cl,%b1 ; rep; bsf %1,%0"
: "=r" (prio) : "q" ((uint32_t)mask), "c" (vpic->priority_add) );
return prio;
}
*/
static inline unsigned int find_first_set_bit(unsigned long word)
{
- asm ( "bsf %1,%0" : "=r" (word) : "r" (word) );
+ asm ( "rep; bsf %1,%0" : "=r" (word) : "rm" (word) );
return (unsigned int)word;
}